Supervised Ranking for Plagiarism Source Retrieval
Abstract
Source retrieval uses a search engine to retrieve candidate sources of plagiarism for a given suspicious document so that more accurate comparisons can be made. We describe a strategy for source retrieval that uses a supervised method to classify and rank search engine results as potential sources of plagiarism without retrieving the documents themselves. In evaluation, our approach achieved the highest precision (0.57) and F1 score (0.47) in the 2014 PAN Source Retrieval task.
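The abstract does not name the classifier or the features used, so the following is only a minimal sketch of the general idea: score each search result from its metadata alone (here, title and snippet) and rank by the predicted probability of being a source. The overlap features, the logistic regression model, the toy data, and all function names are assumptions made for illustration, not the authors' method.

```python
import math

def tokens(text):
    """Lowercase word set; a deliberately crude tokenizer."""
    return set(text.lower().split())

def features(suspicious_text, result):
    """Hypothetical features built from search-result metadata only."""
    doc = tokens(suspicious_text)
    snippet = tokens(result["snippet"])
    title = tokens(result["title"])
    snippet_overlap = len(doc & snippet) / max(len(snippet), 1)
    title_overlap = len(doc & title) / max(len(title), 1)
    return [1.0, snippet_overlap, title_overlap]  # leading 1.0 is the bias term

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    """Predicted probability that a result is a plagiarism source."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train(examples, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(examples, labels):
            error = score(weights, x) - y
            for i, xi in enumerate(x):
                grad[i] += error * xi
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad)]
    return weights

def rank(suspicious_text, results, weights):
    """Return (probability, url) pairs, most likely source first."""
    scored = [(score(weights, features(suspicious_text, r)), r["url"])
              for r in results]
    return sorted(scored, reverse=True)

# Toy labelled data (assumed): 1 = known source, 0 = not a source.
SUSPICIOUS = "the quick brown fox jumps over the lazy dog"
TRAINING = [
    ({"title": "quick brown fox",
      "snippet": "quick brown fox jumps over the lazy dog"}, 1),
    ({"title": "lazy dog story", "snippet": "the fox jumps over the dog"}, 1),
    ({"title": "stock market news",
      "snippet": "shares fell sharply on monday"}, 0),
    ({"title": "the weather today", "snippet": "rain over the coast"}, 0),
]
train_x = [features(SUSPICIOUS, result) for result, _ in TRAINING]
train_y = [label for _, label in TRAINING]
weights = train(train_x, train_y)

# Unseen search results, ranked without downloading either page.
candidates = [
    {"url": "http://example.org/b", "title": "cooking pasta at home",
     "snippet": "boil water and add salt"},
    {"url": "http://example.org/a", "title": "brown fox and lazy dog",
     "snippet": "the quick brown fox jumps"},
]
ranked = rank(SUSPICIOUS, candidates, weights)
for probability, url in ranked:
    print(f"{probability:.2f}  {url}")
```

Because the score depends only on fields the search engine already returns, a system built this way could choose which results to download, or skip, before spending any retrieval budget on them.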
Similar resources
Source Retrieval Based on Learning to Rank and Text Alignment Based on Plagiarism Type Recognition for Plagiarism Detection
This paper treats query keyword selection in source retrieval as the problem of learning a ranking model that chooses the keyword extraction method over suspicious document segments. Four basic methods are used in our ranking function: BM25, TFIDF, TF and EW. Then, a ranking model based on Ranking SVM is proposed to rank the query keywords group which is contributed to get the higher evaluation...
Unsupervised Ranking for Plagiarism Source Retrieval - Notebook for PAN at CLEF 2013
The source retrieval task for plagiarism detection involves the use of a search engine to retrieve candidate sources of plagiarism for a suspicious document and provides a way to efficiently identify candidate documents so that more accurate comparisons can take place. We describe a strategy for source retrieval that makes use of an unsupervised ranking method to rank the results returned by a ...
Approaches for Candidate Document Retrieval and Detailed Comparison of Plagiarism Detection
In this paper we report on our plagiarism detection system, which is used to process the PAN plagiarism corpus for the tasks of Candidate Document Retrieval and Detailed Comparison. To retrieve plagiarism candidate documents through the ChatNoir API, we propose a tf*idf-based method that extracts keywords from suspicious documents to use as queries. A Lucene ranking method is used for plagiarism ca...
Plagiarism Detection Using Information Retrieval and Similarity Measures Based on Image Processing Techniques - Lab Report for PAN at CLEF 2010
This paper describes the Barcelona Media Innovation Center participation in the 2nd International Competition on Plagiarism Detection. In particular, our system focused on the external plagiarism detection task, which assumes the source documents are available. We present a two-step approach. In the first step of our method, we build an information retrieval system based on Solr/Lucene, segmen...
Approaches for Source Retrieval and Text Alignment of Plagiarism Detection - Notebook for PAN at CLEF 2013
In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In the Source Retrieval sub-task, we propose a method that combines TF-IDF, PatTree and Weighted TF-IDF to extract keywords from suspicious documents as queries for retrieving plagiarism source documents. In the Text Alignment sub-task, a method based on sentence similarity is presented. Our text alignment...